String comparison problem

Hi,

How would one go about comparing two strings, one of which may contain
HTML entities (e.g. "cassé" and "cass&#233;")?
I tried to find a way to take the second string and do a replace
whenever such entities are encountered, but this implies creating some
sort of lookup table containing not all but a good number of entity
codes. Unless I am mistaken, JavaScript does not have any function to
replace an entity-infested string with a decoded string, much like PHP's
html_entity_decode. Another way, probably better (but I don't know),
would be to encode the first string.

Any ideas?

Thanks
Jun 1 '07 #1
VK
On Jun 2, 2:38 am, Henri <yeah_ri...@donteventry.com> wrote:
> How would one go about comparing 2 strings one of which may contain
> special entities (eg "cassé" and "cass&#233;")?

Unless there is some Google Groups server "optimization" here, I see
in the first case a word containing the character é (e with an acute
accent) and in the second case a word containing the numeric HTML
entity "&#233;". If so, these are two completely different issues.
JavaScript operates in Unicode (strings are sequences of UTF-16 code
units), so it internally sees any string literal as a Unicode sequence,
no matter what the actual page encoding is. If you need to sort and
transform strings according to the current locale, use the
locale-specific string methods:
string1.localeCompare(string2)
and
toLocaleLowerCase()
toLocaleUpperCase()
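
As a rough illustration of the locale-aware methods listed above (the variable and function names are made up for this sketch, and the exact sort order depends on the collation data shipped with the JavaScript engine):

```javascript
// Hypothetical sample data; \u00e9 is the character é written as an escape.
const words = ["cote", "cass\u00e9", "casse"];

// localeCompare returns a negative number, zero, or a positive number,
// which is what Array.prototype.sort expects from a comparator.
function byLocale(x, y) {
  return x.localeCompare(y);
}

// Sort a copy so the original array is left untouched.
const sorted = words.slice().sort(byLocale);

// Case conversion that follows the rules of the current locale.
const upper = "cass\u00e9".toLocaleUpperCase();
const lower = upper.toLocaleLowerCase();

console.log(sorted.join(", "));
console.log(upper, lower);
```

Note that none of these methods decode HTML entities; they only help once both strings hold real characters.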

In the second case (with the HTML entity) it all depends on where you
are retrieving the string from. If you are getting it from the content
of a loaded page, then by the time you retrieve it the entities are
already parsed, so for JavaScript it is the same Unicode string as in
the first case and you don't need any extra transformation.
If it is a string literal "cass&#233;" then for JavaScript
it is just the character sequence "c-a-s-s-&-#-2-3-3-;" and it has
nothing to do with "cassé". In this case either use a RegExp to replace
entities from a custom table, or insert the string into a (hidden) HTML
element and read back the parsed value.
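
The RegExp-plus-table route might look something like the sketch below. The table NAMED_ENTITIES and the function decodeEntities are hypothetical names, and the table is deliberately tiny; a real one would need many more entries. The hidden-element route needs a browser DOM and is not shown here.

```javascript
// Hypothetical, deliberately incomplete lookup table of named entities.
const NAMED_ENTITIES = {
  amp: "&", lt: "<", gt: ">", quot: '"',
  eacute: "\u00e9", egrave: "\u00e8", agrave: "\u00e0", ccedil: "\u00e7"
};

function decodeEntities(text) {
  // Matches hex (&#xE9;), decimal (&#233;) and named (&eacute;) entities.
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, function (match, body) {
    if (body.charAt(0) === "#") {
      const isHex = body.charAt(1).toLowerCase() === "x";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Values beyond the Unicode range are left as they were.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Named entities are case-sensitive; unknown names are left untouched.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeEntities("cass&#233;") === "cass\u00e9");
```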

Jun 2 '07 #2
VK wrote:
> If it is a string literal "cass&#233;" then obviously for Javascript
> it is just a character sequence "c-a-s-s-&-#-2-3-3-;" and it has
> nothing to do with "cassé". In this case either use RegExp to replace
> entities by custom table; or insert the string into (hidden) HTML
> element and read back the parsed value.

That's the case, and I've started experimenting with the replace
function. Calling, for instance, str.replace(/&#233;/,"é") does produce
a "normalized" string. I have to generalize this in order to
take most accented characters into account.
Thank you for your response.
Jun 2 '07 #3
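To replace an entity-encoded string with its decoded equivalent:

// Not named "normalize": String.prototype.normalize is now a built-in
// (Unicode normalization) and should not be overwritten.
String.prototype.decodeNumericEntities = function () {
  // The g flag makes replace() handle every entity, not just the first.
  return this.replace(/&#([0-9]{1,7});/g, function (match, code) {
    var n = parseInt(code, 10);
    // Leave values beyond the Unicode range (0x10FFFF) untouched.
    return n <= 0x10FFFF ? String.fromCodePoint(n) : match;
  });
};

If s = "cass&#233;", then s.decodeNumericEntities() returns "cassé".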
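
Coming back to the original question, a comparison could then look roughly like this. The function names are invented for the sketch, only decimal entities are handled, and it uses standalone functions rather than a String.prototype method:

```javascript
// Hypothetical helper: turn decimal entities such as &#233; into characters.
function decodeDecimalEntities(text) {
  return text.replace(/&#([0-9]{1,7});/g, function (match, code) {
    const n = parseInt(code, 10);
    return n <= 0x10ffff ? String.fromCodePoint(n) : match;
  });
}

// Decode both sides first, so it does not matter which one carries entities.
function sameText(a, b) {
  return decodeDecimalEntities(a) === decodeDecimalEntities(b);
}

console.log(sameText("cass\u00e9", "cass&#233;"));
console.log(sameText("casse", "cass&#233;"));
```

Decoding both arguments keeps the comparison symmetric, which avoids having to know in advance which string came from the entity-encoded source.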

Henri
Jun 2 '07 #4
